Introduction

In an upcoming analysis, we want to calculate the structural similarity between test cases. For this, we need to know which test methods call which code in the application (the "production code").

In this blog post, I'll show how to get this information for a Java application with jQAssistant, a tool that scans the structural information of your software. I'll also explain the database query that delivers the information we need later on.

Dataset

I've scanned a small pet project of mine called "DropOver", which was originally developed as a web application for organizing parties or bar-hopping tours. I simply added jQAssistant as a Maven plugin to the project's Maven build (see here for a mini tutorial). jQAssistant stores the structures of this application in a property graph within the graph database Neo4j. A subgraph with the structural information that's relevant for our purposes looks like this:

We can see the scanned software entities like Java types (red) or methods (blue) as well as their relationships with each other. We can now explore the database's content with the included Neo4j browser frontend or access the data with a programming language. I use Python (the language in which we'll write our analysis later on) with the py2neo module (the bridge between Python and Neo4j). The information we need can be retrieved by creating and executing a Cypher query (explained in the following). Cypher is Neo4j's language for accessing information in the property graph.

Last, we store the results in a Pandas DataFrame named invocations for a nice tabular representation of the outputs and for further analysis.


In [1]:
import py2neo
import pandas as pd

graph = py2neo.Graph()

query = """
MATCH 
  (testMethod:Method)
    -[:ANNOTATED_BY]->()-[:OF_TYPE]->
      (:Type {fqn:"org.junit.Test"}),
  (testType:Type)-[:DECLARES]->(testMethod),
  (type)-[:DECLARES]->(method:Method),
  (testMethod)-[i:INVOKES]->(method)
WHERE
  NOT type.name ENDS WITH "Test" 
  AND type.fqn STARTS WITH "at.dropover"
  AND NOT method.signature CONTAINS "<init>"
RETURN 
  testType.name as test_type,
  testMethod.signature as test_method,
  type.name as prod_type,
  method.signature as prod_method,
  COUNT(DISTINCT i) as invocations
ORDER BY 
  test_type, test_method, prod_type, prod_method
"""

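# run the query and load the result rows (a list of dicts) into a DataFrame
invocations = pd.DataFrame(graph.run(query).data())
# put the columns into a fixed, readable order
invocations = invocations[
    ["test_type", "test_method", "prod_type", "prod_method", "invocations"]]
invocations.head()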


Out[1]:
   test_type       test_method                                prod_type   prod_method                                        invocations
0  AddCommentTest  void blankSiteContainsRightComment()       AddComment  at.dropover.comment.boundary.GetCommentRespons...  1
1  AddCommentTest  void blankSiteContainsRightCreationTime()  AddComment  at.dropover.comment.boundary.GetCommentRespons...  1
2  AddCommentTest  void blankSiteContainsRightUser()          AddComment  at.dropover.comment.boundary.GetCommentRespons...  1
3  AddCommentTest  void failsAtCommentNull()                  AddComment  at.dropover.comment.boundary.GetCommentRespons...  1
4  AddCommentTest  void failsAtCreatorNull()                  AddComment  at.dropover.comment.boundary.GetCommentRespons...  1

Cypher query explained

Let's go through the query above step by step. It finds all test methods that call methods of our production types and works as follows:

In the MATCH clause, we start our search for particular structural information. We first identify all test methods. These are methods that are annotated with @Test, an annotation that the JUnit 4 framework provides.

MATCH
  (testMethod:Method)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(:Type {fqn:"org.junit.Test"})

Next, we find the test classes that declare (via the DECLARES relationship type) the test methods from above.

(testType:Type)-[:DECLARES]->(testMethod)

With the same approach, we identify all the Java types and the methods they declare. At this point, we don't care about their meaning. Later, we'll narrow them down to production types and methods.

(type)-[:DECLARES]->(method:Method)

Last, we find the test methods that call these other methods by querying the INVOKES relationship.

(testMethod)-[i:INVOKES]->(method)
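To make the pattern matching more tangible, here is a minimal sketch that mimics the four MATCH patterns on a tiny in-memory graph in plain Python. The node names and the triple-based edge list are made up for illustration, node labels like :Method are ignored, and this is of course not how Neo4j evaluates a query internally.

```python
# Toy graph as (source, relationship, target) triples.
# All node names are hypothetical and only serve as an illustration.
edges = [
    ("FooTest.testBar", "ANNOTATED_BY", "annotation#1"),
    ("annotation#1", "OF_TYPE", "org.junit.Test"),
    ("FooTest", "DECLARES", "FooTest.testBar"),
    ("FooTest", "DECLARES", "FooTest.helper"),
    ("Foo", "DECLARES", "Foo.bar"),
    ("FooTest.testBar", "INVOKES", "Foo.bar"),
    ("FooTest.helper", "INVOKES", "Foo.bar"),
]


def targets(source, relationship):
    """All nodes reachable from `source` via one `relationship` edge."""
    return [t for s, r, t in edges if s == source and r == relationship]


def sources(relationship, target):
    """All nodes that point at `target` via one `relationship` edge."""
    return [s for s, r, t in edges if r == relationship and t == target]


# (testMethod)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(:Type {fqn: "org.junit.Test"})
test_methods = [
    method
    for annotation in sources("OF_TYPE", "org.junit.Test")
    for method in sources("ANNOTATED_BY", annotation)
]

matches = []
for test_method in test_methods:
    # (testType)-[:DECLARES]->(testMethod)
    for test_type in sources("DECLARES", test_method):
        # (testMethod)-[:INVOKES]->(method)
        for method in targets(test_method, "INVOKES"):
            # (type)-[:DECLARES]->(method)
            for prod_type in sources("DECLARES", method):
                matches.append((test_type, test_method, prod_type, method))

print(matches)
# [('FooTest', 'FooTest.testBar', 'Foo', 'Foo.bar')]
```

Note that FooTest.helper also invokes Foo.bar but doesn't show up in the result, because it isn't annotated with @Test.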

In the WHERE clause, we define what we see as a production type (and thus implicitly as a production method). A production type is not a test (its name doesn't end with "Test") and it must be part of our application, which means its fqn (fully qualified name) starts with at.dropover. We also filter out any calls to constructors, because those are irrelevant for our analysis.

WHERE
  NOT type.name ENDS WITH "Test" 
  AND type.fqn STARTS WITH "at.dropover"
  AND NOT method.signature CONTAINS "<init>"
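The same three conditions can be written as a small Python predicate, which may help if you want to double-check the filter on already exported data. This is only a sketch: the function name is_production_call and the sample values are my own inventions, not something jQAssistant or py2neo provides.

```python
def is_production_call(type_name, type_fqn, method_signature,
                       package_prefix="at.dropover"):
    """Mirror the three WHERE conditions of the Cypher query."""
    return (
        not type_name.endswith("Test")            # NOT type.name ENDS WITH "Test"
        and type_fqn.startswith(package_prefix)   # type.fqn STARTS WITH "at.dropover"
        and "<init>" not in method_signature      # NOT method.signature CONTAINS "<init>"
    )


# Hypothetical sample values, just for illustration.
print(is_production_call("Foo", "at.dropover.foo.Foo", "void bar()"))          # True
print(is_production_call("FooTest", "at.dropover.foo.FooTest", "void bar()"))  # False
print(is_production_call("ArrayList", "java.util.ArrayList", "int size()"))    # False
print(is_production_call("Foo", "at.dropover.foo.Foo", "void <init>()"))       # False
```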

In the RETURN clause, we return just the information needed for further analysis. These are the names of our test and production types as well as the signatures of the test methods and production methods. We also count the number of calls from each test method to each production method. This is a useful indicator of how strongly a test method is tied to a production method.

RETURN
  testType.name as test_type,
  testMethod.signature as test_method,
  type.name as prod_type,
  method.signature as prod_method,
  COUNT(DISTINCT i) as invocations

In the ORDER BY clause, we simply order the results in a useful way (and for reproducible results):

ORDER BY
  test_type, test_method, prod_type, prod_method
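If the aggregation in the RETURN clause feels a bit magical: in Cypher, the non-aggregated columns act as the grouping key for COUNT. Here is a minimal sketch of the same grouping, counting and ordering in plain Python. The sample tuples are made up, and each tuple stands for one INVOKES relationship.

```python
from collections import Counter

# One tuple per INVOKES relationship:
# (test_type, test_method, prod_type, prod_method); hypothetical sample data.
calls = [
    ("FooTest", "void testA()", "Foo", "void bar()"),
    ("FooTest", "void testA()", "Foo", "void bar()"),
    ("FooTest", "void testA()", "Foo", "int baz()"),
    ("FooTest", "void testB()", "Foo", "void bar()"),
]

# Group by the four non-aggregated columns and count the relationships.
invocation_counts = Counter(calls)

# ORDER BY test_type, test_method, prod_type, prod_method
for key in sorted(invocation_counts):
    print(*key, invocation_counts[key])
# FooTest void testA() Foo int baz() 1
# FooTest void testA() Foo void bar() 2
# FooTest void testB() Foo void bar() 1
```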

A long explanation, but if you are familiar with Cypher and the underlying schema of your graph, you can write such queries within half a minute.

Data export

Because we need that data in a follow-up analysis, we store the information in a semicolon-separated file.


In [2]:
invocations.to_csv("datasets/test_code_invocations.csv", sep=";", index=False)
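For the follow-up analysis, the file has to be read back with the same separator. Below is a small sketch with Python's built-in csv module. The in-memory sample stands in for the exported file and its single row is made up. With Pandas, pd.read_csv with sep=";" does the same job.

```python
import csv
import io

# Stand-in for datasets/test_code_invocations.csv; the row is hypothetical.
sample = io.StringIO(
    "test_type;test_method;prod_type;prod_method;invocations\n"
    "FooTest;void testA();Foo;void bar(int,java.lang.String);2\n"
)

# The semicolon delimiter must match the one used for the export.
rows = list(csv.DictReader(sample, delimiter=";"))

# The csv module returns every value as a string, so convert the counter.
for row in rows:
    row["invocations"] = int(row["invocations"])

print(rows[0]["prod_method"], rows[0]["invocations"])
# void bar(int,java.lang.String) 2
```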

Conclusion

This post was just the prelude to a more in-depth analysis of structural test case similarity. We quickly got the information about which test method calls which production method. Although it's a purely static (or structural) view of our code, it delivers valuable insights for further analysis.

Stay tuned!